Provable convergence of Nesterov’s accelerated gradient method for over-parameterized neural networks
Authors
Abstract
Momentum methods, such as the heavy ball method (HB) and Nesterov's accelerated gradient (NAG), have been widely used in training neural networks by incorporating the history of gradients into the current update. In practice, they often provide improved performance over (stochastic) gradient descent (GD), with faster convergence. Despite their empirical success, a theoretical understanding of their convergence rates is still insufficient. Recently, some attempts have been made to analyze the trajectories of gradient-based methods in the over-parameterized regime, where the number of parameters is significantly larger than the number of training instances. However, the majority of existing work is mainly concerned with GD, and the established result for NAG is inferior to those for HB and GD, which fails to explain the practical success of NAG. In this paper, we take a step towards closing this gap by analyzing NAG in training a randomly initialized two-layer fully connected neural network with ReLU activation. Despite the fact that the objective function is non-convex and non-smooth, we show that NAG converges to a global minimum at a non-asymptotic linear rate (1−Θ(1/√κ))^t, where κ > 1 is the condition number of a Gram matrix and t is the number of iterations. Compared with the (1−Θ(1/κ))^t rate of GD, our result provides theoretical guarantees for the acceleration of NAG in neural network training. Furthermore, our findings suggest that NAG and HB achieve a similar convergence rate. Finally, extensive experiments on six benchmark datasets are conducted to validate the correctness of our theoretical results.
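The paper's exact architecture and hyperparameters are not reproduced here; the following is a minimal numpy sketch of the two update rules the abstract compares (NAG and, in the comment, HB) on a randomly initialized two-layer ReLU network. The toy data, width, step size, and momentum value are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's exact setup): a two-layer ReLU network
# f(x) = (1/sqrt(m)) * a^T relu(W x) trained with Nesterov's accelerated
# gradient (NAG). Data and hyperparameters below are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, m = 32, 10, 256                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))            # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m)        # fixed output layer (common in this line of work)
eta, beta = 1e-2, 0.9                      # step size and momentum (assumed values)

def loss_and_grad(W):
    H = np.maximum(X @ W.T, 0.0)           # hidden activations, shape (n, m)
    pred = H @ a / np.sqrt(m)
    err = pred - y
    # (Sub)gradient of 0.5 * ||pred - y||^2 with respect to W
    G = ((err[:, None] * (X @ W.T > 0)) * a / np.sqrt(m)).T @ X
    return 0.5 * np.sum(err ** 2), G

W_prev = W.copy()
for t in range(500):
    # NAG: take the gradient at the look-ahead (extrapolated) point, then step
    V = W + beta * (W - W_prev)
    _, G = loss_and_grad(V)
    W_prev, W = W, V - eta * G
    # Heavy ball (HB) would instead use the gradient at W itself:
    #   _, G_hb = loss_and_grad(W); W_next = W - eta * G_hb + beta * (W - W_prev)
```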
Similar resources
Deterministic convergence of conjugate gradient method for feedforward neural networks
Conjugate gradient methods have many advantages in real numerical experiments, such as fast convergence and low memory requirements. This paper considers a class of conjugate gradient learning methods for backpropagation (BP) neural networks with three layers. We propose a new learning algorithm for almost cyclic BP neural networks based on the PRP conjugate gradient method. We then establish the d...
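As a point of reference for the method this snippet names, below is a minimal sketch of the Polak–Ribiere–Polyak (PRP) conjugate gradient direction update on a generic differentiable loss. The quadratic loss, fixed step size, and variable names are illustrative assumptions, not the cited paper's formulation.

```python
import numpy as np

# Sketch of the PRP conjugate gradient update; the least-squares loss stands in
# for a BP network's error function.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda w: A.T @ (A @ w - b)          # gradient of 0.5 * ||A w - b||^2

w = np.zeros(5)
g = grad(w)
d = -g                                      # initial direction: steepest descent
for k in range(50):
    alpha = 1e-2                            # fixed step; practical codes use a line search
    w_new = w + alpha * d
    g_new = grad(w_new)
    beta_prp = max(0.0, g_new @ (g_new - g) / (g @ g))   # PRP coefficient (PRP+ clipping)
    d = -g_new + beta_prp * d               # conjugate direction update
    w, g = w_new, g_new
```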
Convergence of Gradient Method for Double Parallel Feedforward Neural Network
The deterministic convergence of a Double Parallel Feedforward Neural Network (DPFNN) is studied. A DPFNN is a parallel connection of a multi-layer feedforward neural network and a single-layer feedforward neural network. The gradient method is used to train the DPFNN with a finite training sample set. The monotonicity of the error function during the training iterations is proved. Then, some weak and stro...
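To make the architecture this snippet describes concrete, here is a minimal sketch of a DPFNN forward pass: the output sums a multi-layer path and a direct single-layer path from the input. The shapes and the sigmoid choice are illustrative assumptions, not the cited paper's exact formulation.

```python
import numpy as np

# Sketch of a Double Parallel Feedforward Neural Network (DPFNN) forward pass:
# a hidden-layer path and a direct input-to-output path are added in parallel.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpfnn_forward(x, W_hidden, w_out, w_direct):
    hidden = sigmoid(W_hidden @ x)           # multi-layer path
    return w_out @ hidden + w_direct @ x     # parallel combination with the direct path

rng = np.random.default_rng(2)
x = rng.standard_normal(8)
y = dpfnn_forward(x, rng.standard_normal((16, 8)),
                  rng.standard_normal(16), rng.standard_normal(8))
```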
A unified convergence bound for conjugate gradient and accelerated gradient
Nesterov’s accelerated gradient method for minimizing a smooth strongly convex function f is known to reduce f(x_k) − f(x*) by a factor of ε ∈ (0, 1) after k ≥ O(√(L/ℓ) log(1/ε)) iterations, where ℓ, L are the two parameters of smooth strong convexity. Furthermore, it is known that this is the best possible complexity in the function-gradient oracle model of computation. The method of linear conju...
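A small sketch of the smooth strongly convex setting this snippet quotes: NAG with the standard constant momentum for known ℓ and L, run on an assumed toy quadratic. The eigenvalue range and iteration count are illustrative choices, not taken from the cited paper.

```python
import numpy as np

# NAG on f(x) = 0.5 * x^T diag(ell..L) x, a smooth strongly convex quadratic.
ell, L, dim = 1.0, 100.0, 50
lam = np.linspace(ell, L, dim)               # eigenvalues of the quadratic
grad = lambda x: lam * x
f = lambda x: 0.5 * np.sum(lam * x ** 2)

x = np.ones(dim)
x_prev = x.copy()
beta = (np.sqrt(L) - np.sqrt(ell)) / (np.sqrt(L) + np.sqrt(ell))
for k in range(200):
    v = x + beta * (x - x_prev)              # look-ahead point
    x_prev, x = x, v - (1.0 / L) * grad(v)   # gradient step from the look-ahead
# f(x) contracts roughly like (1 - sqrt(ell/L))^k, matching the
# O(sqrt(L/ell) * log(1/eps)) iteration bound quoted above.
```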
Convergence of Gradient Method with Momentum for Back-propagation Neural Networks
Wei Wu Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China Email: [email protected] Naimin Zhang Mathematics and Information Science College, Wenzhou University, Wenzhou 325035, China Email: [email protected] Zhengxue Li and Long Li Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China Email: [email protected], long ...
Convergence of Online Gradient Method for Pi-sigma Neural Networks with Inner-penalty Terms
This paper investigates an online gradient method with inner-penalty terms for a novel feedforward network called the pi-sigma network. This network uses product cells as the output units to indirectly incorporate the capabilities of higher-order networks while using fewer weights and processing units. Penalty-term methods have been widely used to improve the generalization performan...
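To illustrate the structure this snippet refers to, here is a minimal sketch of a pi-sigma output (a product of weighted sums) together with a penalized training objective; the squared-norm term stands in for an inner-penalty term. Unit counts, the activation, and the penalty coefficient are illustrative assumptions.

```python
import numpy as np

# Sketch of a pi-sigma network: summing units form weighted sums of the input,
# and a product ("pi") unit multiplies them before the activation.
def pi_sigma_output(x, W, activation=np.tanh):
    sums = W @ x                              # one weighted sum per summing unit
    return activation(np.prod(sums))          # product unit feeds the activation

def penalized_loss(x, y, W, lam=1e-3):
    err = pi_sigma_output(x, W) - y
    return 0.5 * err ** 2 + lam * np.sum(W ** 2)   # error term plus penalty term

rng = np.random.default_rng(3)
x, y = rng.standard_normal(6), 0.5
W = rng.standard_normal((3, 6)) * 0.1         # 3 summing units, 6 inputs
print(penalized_loss(x, y, W))
```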
Journal
Journal title: Knowledge-Based Systems
Year: 2022
ISSN: 1872-7409, 0950-7051
DOI: https://doi.org/10.1016/j.knosys.2022.109277